NSF PAR Search | NSF Public Access Repository

In-Context Reinforcement Learning From Suboptimal Historical Data

Dong, Juncheng; Guo, Moyang; Fang, Ethan X; Yang, Zhuoran; Tarokh, Vahid (August 2025, ICML Proceedings)

Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training an autoregressive transformer for in-context reinforcement learning (ICRL). In this setting, we initially train a transformer on an offline dataset consisting of trajectories collected from various RL tasks, and then fix and use this transformer to create an action policy for new RL tasks. Notably, we consider the setting where the offline dataset contains trajectories sampled from suboptimal behavioral policies. In this case, standard autoregressive training corresponds to imitation learning and results in suboptimal performance. To address this, we propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. In particular, we first train a transformer-based value function that estimates the advantage functions of the behavior policies that collected the suboptimal trajectories. Then we train a transformer-based policy via a weighted maximum likelihood estimation loss, where the weights are constructed based on the trained value function to steer the suboptimal policies to the optimal ones. We conduct extensive experiments to test the performance of DIT on both bandit and Markov Decision Process problems. Our results show that DIT achieves superior performance, particularly when the offline dataset contains suboptimal historical data.
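The weighted maximum likelihood step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are built from the value function, so the exponential weighting below, the `temperature` parameter, and the function names `weighted_nll` and `uniform_nll` are all assumptions made for illustration. This is not the authors' implementation.

```python
import math


def uniform_nll(log_probs):
    """Plain negative log-likelihood: every logged action counts equally.

    This corresponds to imitation of the behavior policy.
    """
    return -sum(log_probs) / len(log_probs)


def weighted_nll(log_probs, advantages, temperature=1.0):
    """Advantage-weighted negative log-likelihood over logged actions.

    ASSUMPTION: weights are exp(advantage / temperature), normalized to
    sum to 1. The abstract only states that the weights are constructed
    from a trained value function; this particular form is hypothetical.
    """
    weights = [math.exp(a / temperature) for a in advantages]
    total = sum(weights)
    return -sum((w / total) * lp for w, lp in zip(weights, log_probs))


# Log-probabilities the current policy assigns to two logged actions,
# with hypothetical advantage estimates for each.
log_probs = [math.log(0.5), math.log(0.1)]
advantages = [1.0, -1.0]

print(uniform_nll(log_probs))
print(weighted_nll(log_probs, advantages))
```

With equal advantages the two losses coincide, so the sketch reduces to imitation learning. With unequal advantages, actions estimated to be better than the behavior policy's average contribute more to the loss, which is the sense in which the weights steer the learned policy away from the suboptimal behavior.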
Free, publicly-accessible full text available August 15, 2026
Conditional Average Treatment Effect Estimation Under Hidden Confounders

Aloui, Ahmed; Dong, Juncheng; Hasan, Ali; Tarokh, Vahid (July 2025, The 41st Conference on Uncertainty in Artificial Intelligence)

Free, publicly-accessible full text available July 21, 2026
CATE Estimation With Potential Outcome Imputation From Local Regression

Aloui, Ahmed; Dong, Juncheng; Le, Cat P; Tarokh, Vahid (July 2025, The 41st Conference on Uncertainty in Artificial Intelligence)

Free, publicly-accessible full text available July 21, 2026
In-Context Reinforcement Learning From Suboptimal Historical Data

Dong, Juncheng; Guo, Moyang; Fang, Ethan X; Yang, Zhuoran; Tarokh, Vahid (July 2025, 2025 International Conference on Machine Learning)

Free, publicly-accessible full text available July 13, 2026
Variational Adversarial Training Towards Policies with Improved Robustness

Dong, Juncheng; Hsu, Hao-Lun; Gao, Qitong; Tarokh, Vahid; Pajic, Miroslav (May 2025, The 28th International Conference on Artificial Intelligence and Statistics)

Free, publicly-accessible full text available May 3, 2026
Off-Policy Evaluation for Human Feedback

Gao, Qitong; Gao, Ge; Dong, Juncheng; Tarokh, Vahid; Chi, Min; Pajic, Miroslav (December 2024, The Thirty-Eighth Annual Conference on Neural Information Processing Systems)

Full Text Available
REFORMA: Robust REinFORceMent Learning via Adaptive Adversary for Drones Flying under Disturbances

https://doi.org/10.1109/ICRA57147.2024.10611002

Hsu, Hao-Lun; Meng, Haocheng; Luo, Shaocheng; Dong, Juncheng; Tarokh, Vahid; Pajic, Miroslav (May 2024, IEEE)

Full Text Available
Transfer Learning for Individual Treatment Effect Estimation

Aloui, Ahmed; Dong, Juncheng; Le, Cat P.; Tarokh, Vahid (July 2023, Uncertainty in Artificial Intelligence)

Full Text Available
PASTA: Pessimistic Assortment Optimization

Dong, Juncheng; Mo, Weibin; Qi, Zhengling; Shi, Cong; Fang, Ethan X; Tarokh, Vahid (February 2023, International Conference on Machine Learning)

Full Text Available
